Fundamentals of Data Mining for Marketers
James R. Stafford
Today’s Agenda - 7 Steps to Better Models
Identify the business problem
Data audit - what data types are most useful and how much do I need?
Exploratory data analysis
Data quality - how to deal with missing data and outliers
Identifying your most predictive variables
Transforming your variables
Choose the best modeling approach
Make sure the model makes sense
Model validation - the Melatonin of modeling
When to re-build your model
What is the business problem? What Should I Predict?
Attrition/Lapse/Churn
Reactivation
Lifetime Value
Profitability
Sales
Secondary data
Frequency
Monetary
Products Purchased
The most important data for modeling!
Primary Data
Recency
Frequency
Monetary
Products Purchased
Demographics
Age
Home Ownership
Dependents
Lifestyle
Type of Car
Hobbies
Travel Preferences
The most important data for customer profiling and building acquisition models!
Secondary data: consumer & business
  Age
  Home Ownership
  Dependents
  Income
Type of Car
Travel Preferences
SIC
Employees
Acquired from another source
Specific or inferred
Actual and reported by individual/household
Modeled after similar profiles
Percentage of data that is specific vs. inferred varies
Costs vary from $2/1,000 to $50/1,000 matches
When to sample
too many records
test campaign to get response
withhold some for model validation
Goal of sample - to be representative of your target customer population
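The sampling points above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the function name `sample_with_holdout`, the 30% holdout share, the fixed seed, and the stand-in list of customer IDs are all hypothetical. It draws a simple random sample (so the sample stays representative of the file) and withholds part of it for model validation.

```python
import random

def sample_with_holdout(records, sample_size, holdout_fraction=0.3, seed=42):
    """Draw a simple random sample, then split it into modeling and holdout sets.

    holdout_fraction and seed are illustrative defaults, not values from the slides.
    """
    rng = random.Random(seed)
    # Simple random sampling without replacement keeps the sample representative.
    sample = rng.sample(records, min(sample_size, len(records)))
    n_holdout = round(len(sample) * holdout_fraction)
    holdout = sample[:n_holdout]      # withheld for model validation
    modeling = sample[n_holdout:]     # used to build the model
    return modeling, holdout

# Hypothetical customer file: 100,000 customer IDs.
customer_ids = list(range(100_000))
model_set, validation_set = sample_with_holdout(customer_ids, 10_000)
print(len(model_set), len(validation_set))
```

Because the sample is already in random order, slicing it gives two random, non-overlapping subsets.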
C = confidence level (1.96 for 95% confidence)
E = acceptable error bound (0.001 = 0.1% response rate)
P = response rate from full file (e.g., 0.03=3%)
Q = (1-P)
How much data do I need? With 10,000 records and a 3.0% response rate (RR), the true rate lies between 2.7% and 3.3% (95 times out of 100!)
What’s the minimum sample size I need to narrow that range to 2.9% - 3.1% around a 3.0% response rate? About 112,000 records (95 times out of 100!)
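The formula that goes with C, E, P and Q did not survive in the extracted slides. The two worked figures above are, however, consistent with the standard sample-size formula for a proportion, n = C² × P × Q / E², so the sketch below assumes that formula. The function names are hypothetical.

```python
import math

def margin_of_error(n, p, c=1.96):
    """Error bound E for a response rate p observed on n records."""
    return c * math.sqrt(p * (1 - p) / n)

def min_sample_size(p, e, c=1.96):
    """Smallest n with error bound e, assuming n = C^2 * P * Q / E^2."""
    q = 1 - p
    return math.ceil(c ** 2 * p * q / e ** 2)

# 10,000 records at a 3.0% response rate: roughly +/- 0.33 points (2.7% to 3.3%).
print(round(margin_of_error(10_000, 0.03), 4))

# Tightening the bound to +/- 0.1 points (2.9% to 3.1%) needs about 112,000 records.
print(min_sample_size(0.03, 0.001))
```

Note that the required sample grows with the square of the precision: cutting the error bound to roughly a third multiplies the sample size by about eleven.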
How much data do I need? Minimums
Lifetime value models - at least 300 customers/records.
Use/recode (e.g., as -999) - missing values may be meaningful, e.g., lots of missing data can be important in fraud detection
Substitute - mean, median or mode
Delete records from analysis
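The three options for missing data can be sketched as follows. This is a minimal illustration using only the Python standard library; the helper names, the use of `None` to mark a missing value, and the sample ages are assumptions made for the example.

```python
import statistics

def recode_missing(values, sentinel=-999):
    """Option 1: keep the records and flag missing values with a sentinel."""
    return [sentinel if v is None else v for v in values]

def substitute_missing(values, method="median"):
    """Option 2: fill missing values with the mean, median or mode of the rest."""
    observed = [v for v in values if v is not None]
    fill_functions = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }
    fill = fill_functions[method](observed)
    return [fill if v is None else v for v in values]

def delete_missing(values):
    """Option 3: drop records with missing values from the analysis."""
    return [v for v in values if v is not None]

ages = [34, None, 51, 42, None, 42]  # hypothetical customer ages
print(recode_missing(ages))
print(substitute_missing(ages, "median"))
print(delete_missing(ages))
```

If the sentinel route is chosen, the -999 values should be treated as a category or paired with a missing-value flag, since a model that reads them as real numbers would be distorted.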
Outliers - data outside of reasonable bounds
customer age = 170
customer balance = $1.5M (when the next-highest value is $10,000)
identify with plots: frequency distributions/histograms
Use, substitute or delete
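A small sketch of spotting outliers with a frequency distribution and a reasonable-bounds check is shown below. The bin width, the age bounds of 0 to 110, the function names and the sample data are all assumptions chosen for illustration; what counts as "reasonable" depends on the variable.

```python
from collections import Counter

def frequency_distribution(values, bin_width):
    """Count values per bin; each bin is labelled by its lower edge."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))

def flag_outliers(values, low, high):
    """Return the values that fall outside the bounds [low, high]."""
    return [v for v in values if not (low <= v <= high)]

ages = [23, 35, 41, 38, 170, 52, 47]  # hypothetical customer ages

# A text version of a histogram: the isolated bin at 170 stands out.
for lower_edge, count in frequency_distribution(ages, 10).items():
    print(f"{lower_edge:>4}: {'#' * count}")

print(flag_outliers(ages, 0, 110))
```

Once flagged, the same three choices apply as for missing data: use the value, substitute it, or delete the record.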

Fundamentals Of Data Mining 2010